Evans
TaleFrame: An Interactive Story Generation System with Fine-Grained Control and Large Language Models
Wang, Yunchao, Sun, Guodao, Fu, Zihang, Liu, Zhehao, Du, Kaixing, Gao, Haidong, Liang, Ronghua
With the advancement of natural language generation (NLG) technologies, creative story generation systems have gained increasing attention. However, current systems often fail to accurately translate user intent into satisfactory story outputs due to a lack of fine-grained control and unclear input specifications, limiting their applicability. To address this, we propose TaleFrame, a system that combines large language models (LLMs) with human-computer interaction (HCI) to generate stories through structured information, enabling precise control over the generation process. The innovation of TaleFrame lies in decomposing the story structure into four basic units: entities, events, relationships, and story outline. We leverage the TinyStories dataset, parsing it to construct a preference dataset of 9,851 JSON-formatted entries, which is then used to fine-tune a local Llama model. Using this JSON2Story approach, structured data is transformed into coherent stories. TaleFrame also offers an intuitive interface that supports users in creating and editing entities and events and in generating stories through the structured framework. Users can control these units through simple interactions (e.g., drag-and-drop, attach, and connect), thus influencing the details and progression of the story. The generated stories can be evaluated across seven dimensions (e.g., creativity, structural integrity), with the system suggesting refinements based on these evaluations. Users can iteratively adjust the story until a satisfactory result is achieved. Finally, we conduct a quantitative evaluation and user studies that demonstrate the usefulness of TaleFrame. Dataset available at https://huggingface.co/datasets/guodaosun/tale-frame.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- North America > United States > Colorado > Weld County > Evans (0.04)
- Asia > Singapore (0.04)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.47)
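TaleFrame's four story units (entities, events, relationships, and a story outline) map naturally onto a JSON record of the kind a JSON2Story model would consume. The sketch below shows one plausible shape for such a record, with a consistency check that events and relationships only mention declared entities. All class and field names here (`StoryFrame`, `Entity`, `participants`, and so on) are hypothetical and are not taken from the released dataset's schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # hypothetical labels such as "character", "object", "place"
    traits: list[str] = field(default_factory=list)


@dataclass
class Event:
    description: str
    participants: list[str]  # names of the entities involved in this event


@dataclass
class Relationship:
    source: str
    target: str
    label: str


@dataclass
class StoryFrame:
    entities: list[Entity]
    events: list[Event]
    relationships: list[Relationship]
    outline: str

    def validate(self) -> None:
        """Raise ValueError if an event or relationship mentions an undeclared entity."""
        names = {e.name for e in self.entities}
        for ev in self.events:
            missing = set(ev.participants) - names
            if missing:
                raise ValueError(f"event references unknown entities: {sorted(missing)}")
        for rel in self.relationships:
            if rel.source not in names or rel.target not in names:
                raise ValueError(f"relationship references unknown entity: {rel}")

    def to_json(self) -> str:
        """Validate, then serialize the nested dataclasses to a JSON string."""
        self.validate()
        return json.dumps(asdict(self), indent=2)


frame = StoryFrame(
    entities=[Entity("Lily", "character", ["curious"]), Entity("Max", "character", ["shy"])],
    events=[
        Event("Lily finds a lost kite in the park", ["Lily"]),
        Event("Lily and Max fly the kite together", ["Lily", "Max"]),
    ],
    relationships=[Relationship("Lily", "Max", "friend")],
    outline="A girl finds a kite and makes a new friend.",
)
print(frame.to_json())
```

In this sketch, validation happens at serialization time so that a malformed frame never reaches the generator; the fine-tuned Llama model and its prompt format are not shown.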
Examining the Usage of Generative AI Models in Student Learning Activities for Software Programming
Chen, Rufeng, Jiang, Shuaishuai, Shen, Jiyun, Moon, AJung, Wei, Lili
The rise of Generative AI (GenAI) tools like ChatGPT has created new opportunities and challenges for computing education. Existing research has primarily focused on GenAI's ability to complete educational tasks and its impact on student performance, often overlooking its effects on knowledge gains. In this study, we investigate how GenAI assistance compares to conventional online resources in supporting knowledge gains across different proficiency levels. We conducted a controlled user experiment with 24 undergraduate students at two levels of programming experience (beginner and intermediate) to examine how students interact with ChatGPT while solving programming tasks. We analyzed task performance, conceptual understanding, and interaction behaviors. Our findings reveal that generating complete solutions with GenAI significantly improves task performance, especially for beginners, but does not consistently result in knowledge gains. Importantly, usage strategies differ by experience: beginners tend to rely heavily on GenAI to complete tasks, often without gaining knowledge in the process, while intermediates adopt more selective approaches. We find that both over-reliance and minimal use result in weaker knowledge gains overall. Based on our results, we call on students and educators to adopt GenAI as a learning tool rather than a problem-solving tool. Our study highlights the urgent need for guidance when integrating GenAI into programming education to foster deeper understanding.
The rapid development of Generative Artificial Intelligence (GenAI) has led to its widespread adoption across various domains to boost productivity and streamline workflows. Large Language Models (LLMs), such as OpenAI's ChatGPT and Codex, Google Gemini, and GitHub Copilot, have been integrated into domains including software engineering [1], [2], healthcare [3], education [4], creative writing [5], [6], and digital music [7], offering capabilities such as code generation, question answering, and image generation. Some studies evaluated GenAI's performance on programming tasks [8], user interface design education [9], and computer vision coursework [10]. Others focused on assessing the accuracy and usability of GenAI-generated responses [11], [12].
- North America > Canada > Quebec > Montreal (0.15)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Colorado > Weld County > Evans (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education > Curriculum > Subject-Specific Education (0.68)
- Education > Educational Setting (0.48)
- Education > Educational Technology (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models
There are two updating strategies: 1) a mimicking strategy that generates similar samples based on the original data, preserving their stylistic and contextual essence, and 2) an extending strategy that further expands existing samples at varying cognitive levels by adapting Bloom's taxonomy of educational objectives.
- North America > United States > Mississippi (0.04)
- Asia > Singapore (0.04)
- North America > United States > Colorado > Weld County > Evans (0.04)
- Education (0.88)
- Information Technology (0.67)
- Leisure & Entertainment > Sports > Basketball (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.92)
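The two updating strategies can be pictured as prompt templates sent to a generator model. The sketch below builds a mimicking prompt and one extending prompt per cognitive level. It assumes the six levels of the revised Bloom's taxonomy (remember, understand, apply, analyze, evaluate, create); the function names and prompt wording are hypothetical rather than the authors' own, and the model call itself is omitted.

```python
# Six levels of the revised Bloom's taxonomy (assumed; the paper's exact level set may differ).
BLOOM_LEVELS = ("remember", "understand", "apply", "analyze", "evaluate", "create")


def mimic_prompt(sample: str) -> str:
    """Mimicking: ask for a new sample that keeps the original's style and context."""
    return (
        "Write a new question that keeps the style and context of the example below "
        "but changes its specifics.\n\nExample:\n" + sample
    )


def extend_prompts(sample: str, levels=BLOOM_LEVELS) -> dict[str, str]:
    """Extending: build one prompt per requested cognitive level."""
    prompts = {}
    for level in levels:
        if level not in BLOOM_LEVELS:
            raise ValueError(f"unknown Bloom level: {level!r}")
        prompts[level] = (
            "Using the example below as a starting point, write a question that "
            f"targets the '{level}' level of Bloom's taxonomy.\n\nExample:\n{sample}"
        )
    return prompts


seed = "How many points is a free throw worth in basketball?"
print(extend_prompts(seed)["analyze"])
```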
Interaction Dynamics as a Reward Signal for LLMs
Gooding, Sian, Grefenstette, Edward
The alignment of Large Language Models (LLMs) for multi-turn conversations typically relies on reward signals derived from the content of the text. This approach, however, overlooks a rich, complementary source of signal: the dynamics of the interaction itself. This paper introduces TRACE (Trajectory-based Reward for Agent Collaboration Estimation), a novel reward signal derived from the geometric properties of a dialogue's embedding trajectory, a concept we term 'conversational geometry'. Our central finding is that a reward model trained only on these structural signals achieves a pairwise accuracy (68.20%) comparable to a powerful LLM baseline that analyzes the full transcript (70.04%). Furthermore, a hybrid model combining interaction dynamics with textual analysis achieves the highest performance (80.17%), demonstrating their complementary nature. This work provides strong evidence that for interactive settings, how an agent communicates is as powerful a predictor of success as what it says, offering a new, privacy-preserving framework that not only aligns agents but also serves as a diagnostic tool for understanding the distinct interaction patterns that drive successful collaboration.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Colorado > Weld County > Evans (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.69)
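TRACE derives its reward from the geometry of a dialogue's embedding trajectory, but the abstract does not list the exact features. As an illustration only, the sketch below computes a few simple geometric summaries (step lengths, path efficiency, and cosine similarity between consecutive turns) from an array of per-turn embeddings, plus the pairwise accuracy metric used to compare reward models. The feature set and function names are assumptions, and the turn embeddings are assumed to come from some sentence encoder that is not shown.

```python
import numpy as np


def trajectory_features(turn_embeddings: np.ndarray) -> dict[str, float]:
    """Illustrative geometric summaries of a dialogue's embedding trajectory.

    turn_embeddings: array of shape (num_turns, dim), one nonzero row per turn.
    """
    x = np.asarray(turn_embeddings, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two turns")
    steps = np.diff(x, axis=0)  # displacement between consecutive turns
    step_lengths = np.linalg.norm(steps, axis=1)
    path_length = float(step_lengths.sum())
    net_displacement = float(np.linalg.norm(x[-1] - x[0]))
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    consecutive_cos = np.sum(unit[:-1] * unit[1:], axis=1)
    return {
        "mean_step": float(step_lengths.mean()),
        "path_length": path_length,
        # 1.0 for a straight-line trajectory, smaller when the dialogue wanders
        "efficiency": net_displacement / path_length if path_length > 0 else 1.0,
        "mean_consecutive_cos": float(consecutive_cos.mean()),
    }


def pairwise_accuracy(scores_preferred, scores_rejected) -> float:
    """Fraction of pairs in which the preferred dialogue receives the higher score."""
    a = np.asarray(scores_preferred, dtype=float)
    b = np.asarray(scores_rejected, dtype=float)
    return float(np.mean(a > b))
```

A reward model in this spirit would be trained on such feature vectors; pairwise accuracy then counts how often it ranks the preferred conversation above the rejected one.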
Node-Based Editing for Multimodal Generation of Text, Audio, Image, and Video
Kyaw, Alexander Htet, Sivalingam, Lenin Ravindranath
We present a node-based storytelling system for multimodal content generation. The system represents stories as graphs of nodes that can be expanded, edited, and iteratively refined through direct user edits and natural-language prompts. Each node can integrate text, images, audio, and video, allowing creators to compose multimodal narratives. A task selection agent routes between specialized generative tasks that handle story generation, node structure reasoning, node diagram formatting, and context generation. The interface supports targeted editing of individual nodes, automatic branching for parallel storylines, and node-based iterative refinement. Our results demonstrate that node-based editing supports control over narrative structure and iterative generation of text, images, audio, and video. We report quantitative outcomes on automatic story outline generation and qualitative observations of editing workflows. Finally, we discuss current limitations such as scalability to longer narratives and consistency across multiple nodes, and outline future work toward human-in-the-loop and user-centered creative AI tools.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Colorado > Weld County > Evans (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.68)
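The node-based representation described in this entry can be sketched as a small tree of story nodes in which branching creates parallel storylines and edits touch only the targeted node. The class and method names below are hypothetical, the `media` field is assumed to hold references to generated image, audio, or video assets, and the generative tasks and task selection agent are not modeled.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class StoryNode:
    node_id: int
    text: str
    media: dict[str, str] = field(default_factory=dict)  # modality -> asset reference (hypothetical)
    children: list[int] = field(default_factory=list)


class StoryGraph:
    def __init__(self, root_text: str):
        self._ids = count()
        self.nodes: dict[int, StoryNode] = {}
        self.root = self.add_node(root_text)

    def add_node(self, text: str, parent: Optional[int] = None) -> int:
        """Expand the story by attaching a new node under `parent`."""
        nid = next(self._ids)
        self.nodes[nid] = StoryNode(nid, text)
        if parent is not None:
            self.nodes[parent].children.append(nid)
        return nid

    def branch(self, parent: int, alternatives: list[str]) -> list[int]:
        """Create parallel storylines as sibling children of one node."""
        return [self.add_node(t, parent) for t in alternatives]

    def edit(self, nid: int, new_text: str) -> None:
        """Targeted edit: only this node changes; the rest of the graph is untouched."""
        self.nodes[nid].text = new_text

    def storylines(self) -> list[list[int]]:
        """Enumerate root-to-leaf paths, one per parallel storyline."""
        paths, stack = [], [[self.root]]
        while stack:
            path = stack.pop()
            kids = self.nodes[path[-1]].children
            if not kids:
                paths.append(path)
            else:
                stack.extend(path + [k] for k in kids)
        return paths
```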
CML-Bench: A Framework for Evaluating and Enhancing LLM-Powered Movie Scripts Generation
Zheng, Mingzhe, Song, Dingjie, Zhou, Guanyu, You, Jun, Zhan, Jiahao, Ma, Xuran, Song, Xinyuan, Lim, Ser-Nam, Chen, Qifeng, Yang, Harry
Large Language Models (LLMs) have demonstrated remarkable proficiency in generating highly structured texts. However, while exhibiting a high degree of structural organization, movie scripts demand an additional layer of nuanced storytelling and emotional depth (the 'soul' of compelling cinema) that LLMs often fail to capture. To investigate this deficiency, we first curated CML-Dataset, a dataset comprising (summary, content) pairs for Cinematic Markup Language (CML), where 'content' consists of segments from esteemed, high-quality movie scripts and 'summary' is a concise description of the content. Through an in-depth analysis of the intrinsic multi-shot continuity and narrative structures within these authentic scripts, we identified three pivotal dimensions for quality assessment: Dialogue Coherence (DC), Character Consistency (CC), and Plot Reasonableness (PR). Informed by these findings, we propose CML-Bench, featuring quantitative metrics across these dimensions. CML-Bench effectively assigns high scores to well-crafted, human-written scripts while concurrently pinpointing the weaknesses in screenplays generated by LLMs. To further validate our benchmark, we introduce CML-Instruction, a prompting strategy with detailed instructions on character dialogue and event logic, to guide LLMs to generate more structured and cinematically sound scripts. Extensive experiments validate the effectiveness of our benchmark and demonstrate that LLMs guided by CML-Instruction generate higher-quality screenplays, with results aligned with human preferences.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > Colorado > Weld County > Evans (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
CreAgentive: An Agent Workflow Driven Multi-Category Creative Generation Engine
Cheng, Yuyang, Cai, Linyue, Peng, Changwei, Xu, Yumiao, Bie, Rongfang, Zhao, Yong
We present CreAgentive, an agent-workflow-driven multi-category creative generation engine that addresses four key limitations of contemporary large language models in writing stories, drama, and other categories of creative work: restricted genre diversity, insufficient output length, weak narrative coherence, and inability to enforce complex structural constructs. At its core, CreAgentive employs a Story Prototype, which is a genre-agnostic, knowledge graph-based narrative representation that decouples story logic from stylistic realization by encoding characters, events, and environments as semantic triples. CreAgentive engages a three-stage agent workflow that comprises: an Initialization Stage that constructs a user-specified narrative skeleton; a Generation Stage in which long- and short-term objectives guide multi-agent dialogues to instantiate the Story Prototype; and a Writing Stage that leverages this prototype to produce multi-genre text with advanced structures such as retrospection and foreshadowing. This architecture reduces storage redundancy and overcomes the typical bottlenecks of long-form generation. In extensive experiments, CreAgentive generates thousands of chapters with stable quality and low cost (less than $1 per 100 chapters) using a general-purpose backbone model. To evaluate performance, we define a two-dimensional framework with 10 narrative indicators measuring both quality and length. Results show that CreAgentive consistently outperforms strong baselines and achieves robust performance across diverse genres, approaching the quality of human-authored novels.
- North America > United States > Colorado > Weld County > Evans (0.04)
- Asia > China > Beijing > Beijing (0.04)
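CreAgentive's Story Prototype encodes characters, events, and environments as semantic triples. A minimal in-memory triple store illustrating this idea is sketched below; the class name, predicate vocabulary, and example facts are hypothetical, and a real system would likely use a proper knowledge-graph backend.

```python
from collections import defaultdict


class StoryPrototype:
    """A tiny triple store of (subject, predicate, object) facts about a story."""

    def __init__(self):
        self.triples: set[tuple[str, str, str]] = set()
        self._by_subject: defaultdict[str, set] = defaultdict(set)  # index for subject lookups

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record a fact; adding the same triple twice has no effect."""
        t = (subject, predicate, obj)
        self.triples.add(t)
        self._by_subject[subject].add(t)

    def query(self, subject=None, predicate=None, obj=None) -> list[tuple[str, str, str]]:
        """Return triples matching every non-None field, sorted for stable output."""
        pool = self._by_subject.get(subject, set()) if subject is not None else self.triples
        return sorted(
            t for t in pool
            if (predicate is None or t[1] == predicate) and (obj is None or t[2] == obj)
        )


story = StoryPrototype()
story.add("Mara", "lives_in", "harbor town")
story.add("Mara", "discovers", "sealed letter")
story.add("sealed letter", "foreshadows", "father's return")
print(story.query(subject="Mara"))
```

Because the triples carry story logic only, the same set could in principle be rendered as prose, drama, or another genre by a separate writing step, which is the decoupling the abstract describes.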
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
Chakrabarty, Tuhin, Laban, Philippe, Wu, Chien-Sheng
AI-generated text is proliferating across domains, from creative writing and journalism to marketing content and scientific articles. Models can follow user-provided instructions to generate coherent and grammatically correct outputs, but in this work, we study a more fundamental question: how do we evaluate and improve the writing quality of AI-generated text? Writing quality assessment has received less attention from the community, in part because it is fundamentally subjective and requires expertise. We first introduce the Writing Quality Benchmark (WQ) by consolidating five writing-preference datasets into 4,729 writing quality judgments. Our experiments show that most of the competitive baselines, including state-of-the-art LLMs that excel at reasoning tasks, barely outperform random baselines on WQ. We then train specialized Writing Quality Reward Models (WQRM) of various sizes for writing quality assessment that demonstrate strong generalization on four out-of-distribution test sets and 74% accuracy on the WQ benchmark. To further show WQRM's practical benefits during inference, we leverage additional test-time compute to generate and rank multiple candidate revisions, allowing us to select higher-quality outputs from an initial draft. Human evaluation with 9 experienced writers confirms that WQRM-based selection produces writing samples preferred by experts 66% of the time overall, and 72.2% of the time when the reward gap is larger than 1 point. We release our datasets and models to encourage community engagement with writing quality assessment and development of AI writing systems better aligned with human preferences.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Colorado > Weld County > Evans (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report (1.00)
- Personal > Honors (0.46)
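The test-time procedure in the WQRM abstract (generate several candidate revisions, score them with a reward model, keep the best) is a form of best-of-n selection. The sketch below assumes the candidates already exist and uses `toy_reward`, a hypothetical heuristic standing in for a trained WQRM. The `min_gap` threshold, which keeps the original draft unless a candidate wins by more than a margin, is our own addition, loosely inspired by the abstract's reporting of results by reward gap.

```python
from typing import Callable, Sequence


def toy_reward(text: str) -> float:
    """Hypothetical stand-in for a trained writing-quality reward model."""
    tokens = [w.strip(".,;:!?") for w in text.lower().split()]
    cliches = {"tapestry", "testament", "delve"}
    # Penalize each cliche by 1 point and each "very" by half a point.
    return -sum(t in cliches for t in tokens) - 0.5 * tokens.count("very")


def select_revision(draft: str,
                    candidates: Sequence[str],
                    reward: Callable[[str], float],
                    min_gap: float = 0.0) -> str:
    """Best-of-n: return the top-scoring candidate if it beats the draft by more than min_gap."""
    scored = [(reward(c), c) for c in candidates]
    if not scored:
        return draft
    best_score, best = max(scored, key=lambda sc: sc[0])
    return best if best_score - reward(draft) > min_gap else draft


draft = "The city was a tapestry of very bright lights."
candidates = ["The city glittered with bright lights.", "The city was very, very bright."]
print(select_revision(draft, candidates, toy_reward))
```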
Variational volume reconstruction with the Deep Ritz Method
Rowan, Conor, Soman, Sumedh, Evans, John A.
We present a novel approach to variational volume reconstruction from sparse, noisy slice data using the Deep Ritz method. Motivated by biomedical imaging applications such as MRI-based slice-to-volume reconstruction (SVR), our approach addresses three key challenges: (i) the reliance on image segmentation to extract boundaries from noisy grayscale slice images, (ii) the need to reconstruct volumes from a limited number of slice planes, and (iii) the computational expense of traditional mesh-based methods. We formulate a variational objective that combines a regression loss designed to avoid image segmentation by operating on noisy slice data directly with a modified Cahn-Hilliard energy incorporating anisotropic diffusion to regularize the reconstructed geometry. We discretize the phase field with a neural network, approximate the objective at each optimization step with Monte Carlo integration, and use the Adam optimizer to find the minimum of the approximated variational objective. While the stochastic integration may not yield the true solution to the variational problem, we demonstrate that our method reliably produces high-quality reconstructed volumes in a matter of seconds, even when the slice data is sparse and noisy.
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- North America > United States > Colorado > Weld County > Evans (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.48)
- Health & Medicine > Diagnostic Medicine > Imaging (0.88)
- Government > Military (0.67)
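The Deep Ritz approach in this entry minimizes an energy whose integral is estimated by Monte Carlo sampling at each optimization step: with N uniform samples in a domain Omega, the integral is approximated by |Omega| times the sample mean of the integrand. The sketch below applies this to a fixed, analytically defined phase field (a smoothed sphere) rather than a neural network, and uses the standard isotropic Cahn-Hilliard energy with double-well potential phi^2 (1 - phi)^2 rather than the authors' modified anisotropic one, so it illustrates only the integration step. The domain, radius, width, and epsilon values are arbitrary choices.

```python
import numpy as np


def phase_field(x, radius=0.5, width=0.05):
    """Smoothed indicator of a sphere: about 1 inside, about 0 outside."""
    r = np.linalg.norm(x, axis=-1)
    return 0.5 * (1.0 - np.tanh((r - radius) / width))


def grad_norm(x, radius=0.5, width=0.05):
    """|grad phi| from the analytic radial derivative: 0.5 * sech^2((r - R) / w) / w."""
    r = np.linalg.norm(x, axis=-1)
    return 0.5 / width / np.cosh((r - radius) / width) ** 2


def energy_density(x, eps=0.05):
    """Isotropic Cahn-Hilliard integrand: eps/2 |grad phi|^2 + W(phi)/eps."""
    phi = phase_field(x)
    double_well = phi**2 * (1.0 - phi) ** 2
    return 0.5 * eps * grad_norm(x) ** 2 + double_well / eps


def mc_energy(n_samples, rng, low=-1.0, high=1.0, dim=3):
    """Monte Carlo estimate: |Omega| times the mean integrand at uniform samples."""
    x = rng.uniform(low, high, size=(n_samples, dim))
    volume = (high - low) ** dim
    return volume * energy_density(x).mean()


print(mc_energy(200_000, np.random.default_rng(0)))
```

In the actual method, phi would be a neural network, its gradient would come from automatic differentiation, and a fresh sample batch would be drawn at each Adam step, which is why each step optimizes only an approximation of the variational objective.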